debakarr
GitHub Repository: debakarr/machinelearning
Path: blob/master/Part 4 - Clustering/K-Means Clustering/[R] K-Means Clustering.ipynb
Kernel: R

K-Means Clustering

Data preprocessing

# Importing the dataset
dataset = read.csv('Mall_Customers.csv')
# Keep columns 4 and 5 (annual income and spending score)
X = dataset[4:5]

Using the elbow method to find the optimal number of clusters

set.seed(12)
wcss = vector()  # Within-cluster sum of squares
for (i in 1:10) wcss[i] = sum(kmeans(X, i)$withinss)
plot(1:10, wcss, type = 'b',
     main = paste('Elbow Method'),
     xlab = 'Number of clusters',
     ylab = 'WCSS')
[Figure: 'Elbow Method' plot of WCSS against the number of clusters, 1 to 10]

From the elbow plot, the optimal number of clusters for this dataset is 5: beyond that point, adding more clusters reduces WCSS only slightly.
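To make the WCSS quantity on the y-axis concrete, here is a minimal sketch that recomputes it by hand and compares it with what `kmeans()` reports. The data frame `X_demo` and its column names are synthetic stand-ins invented for illustration, not the mall dataset.

```r
# Synthetic stand-in for the two mall columns (made-up values, assumption)
set.seed(1)
X_demo <- data.frame(
  income   = c(rnorm(30, 25, 4), rnorm(30, 85, 6), rnorm(30, 55, 5)),
  spending = c(rnorm(30, 20, 5), rnorm(30, 80, 6), rnorm(30, 50, 5))
)

fit <- kmeans(X_demo, centers = 3, nstart = 10)

# WCSS by hand: for each cluster, sum the squared distances
# from its points to that cluster's own mean
manual_wcss <- sum(sapply(sort(unique(fit$cluster)), function(k) {
  pts <- as.matrix(X_demo[fit$cluster == k, , drop = FALSE])
  sum(sweep(pts, 2, colMeans(pts))^2)
}))

# Both comparisons should report TRUE
all.equal(manual_wcss, sum(fit$withinss))
all.equal(manual_wcss, fit$tot.withinss)
```

The elbow loop in the notebook sums `$withinss` for each value of `i`; reading `$tot.withinss` directly gives the same number.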


Applying K-Means to the Mall dataset

set.seed(123)
# Store the fit under its own name so it does not shadow the kmeans() function
kmeans_fit = kmeans(x = X, centers = 5, iter.max = 300, nstart = 10)
y_kmeans = kmeans_fit$cluster
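The object returned by `kmeans()` carries more than the cluster labels. The sketch below, run on a synthetic data frame (`X_demo` is an invented stand-in, not the mall data), shows the components that are usually worth inspecting after a fit with the same arguments as above.

```r
# Synthetic stand-in data (made-up values, assumption)
set.seed(2)
X_demo <- data.frame(
  income   = runif(100, 15, 140),
  spending = runif(100, 1, 100)
)

set.seed(123)
fit <- kmeans(x = X_demo, centers = 5, iter.max = 300, nstart = 10)

fit$centers         # one row per cluster, one column per variable
fit$size            # number of points in each cluster
head(fit$cluster)   # cluster label (1 to 5) for the first few rows
fit$tot.withinss    # total WCSS of the best of the 10 random starts
table(fit$cluster)  # same counts as fit$size
```

With `nstart = 10`, `kmeans()` tries ten random sets of starting centres and keeps the run with the lowest total WCSS, which makes the result less sensitive to a single unlucky initialisation.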

Visualising the clusters

library(cluster)
clusplot(X, y_kmeans,
         lines = 0,
         shade = TRUE,
         color = TRUE,
         labels = 5,
         plotchar = FALSE,
         span = TRUE,
         main = paste('Clusters of customers'),
         xlab = 'Annual Income',
         ylab = 'Spending Score')
# for more see help(clusplot.default)
[Figure: 'Clusters of customers' plot, x-axis 'Annual Income', y-axis 'Spending Score']
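According to its documentation, `clusplot` positions the points using principal components, so the plotted coordinates are not the raw variable values even when the axis labels are overridden. As an alternative, the following sketch draws the clusters on the original axes with base R graphics. It uses a synthetic data frame (`X_demo`, invented for illustration) rather than the mall data.

```r
# Synthetic stand-in data with two obvious groups (made-up values, assumption)
set.seed(3)
X_demo <- data.frame(
  income   = c(rnorm(40, 30, 6), rnorm(40, 90, 8)),
  spending = c(rnorm(40, 25, 7), rnorm(40, 80, 7))
)
fit <- kmeans(X_demo, centers = 2, nstart = 10)

# Colour each point by its cluster label (base palette indices 2, 3, ...)
cluster_colours <- fit$cluster + 1
plot(X_demo$income, X_demo$spending,
     col = cluster_colours, pch = 19,
     main = 'Clusters on the original axes',
     xlab = 'income (synthetic)',
     ylab = 'spending (synthetic)')

# Mark the cluster centres with crosses
points(fit$centers, pch = 4, cex = 2, lwd = 2)
```

Because both axes are the raw variables, cluster positions can be read off directly as income and spending values.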

The target customers are those with high annual income and a high spending score. In this run, the data points in cluster 1 are those customers. Cluster numbers depend on the random initialisation, so a different seed may assign this group a different label.
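Since the cluster number itself is arbitrary, one option is to pick the target group from the fitted centres instead of reading the label off the plot. The sketch below is one possible rule, not the notebook's method: it standardises the centres and selects the cluster that is high on both variables. `X_demo` and the selection rule are assumptions made for illustration.

```r
# Synthetic stand-in data: four groups, one of them high on both variables
set.seed(4)
X_demo <- data.frame(
  income   = c(rnorm(30, 25, 4), rnorm(30, 25, 4),
               rnorm(30, 88, 6), rnorm(30, 88, 6)),
  spending = c(rnorm(30, 20, 5), rnorm(30, 80, 5),
               rnorm(30, 18, 5), rnorm(30, 82, 5))
)
fit <- kmeans(X_demo, centers = 4, nstart = 10)

# Express each centre in standard deviations from the overall mean,
# so income and spending are on a comparable scale
centre_z <- scale(fit$centers,
                  center = colMeans(X_demo),
                  scale  = apply(X_demo, 2, sd))

# Hypothetical rule: take the cluster whose weaker variable is still the
# highest, that is, the one that is high on both income and spending
target <- which.max(apply(centre_z, 1, min))

fit$centers[target, ]                    # centre of the target cluster
target_rows <- which(fit$cluster == target)
length(target_rows)                      # number of customers in it
```

The rows in `target_rows` can then be used to subset the original data frame, whatever label the run happened to assign.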